The germline mutation rate determines the pace of genome evolution and is an
evolving parameter itself. However, little is known about what determines its
evolution, as most studies of mutation rates have focused on single species with
different methodologies. Here we quantify germline mutation rates across
vertebrates by sequencing and comparing the high-coverage genomes of 151 parent- 
offspring trios from 68 species of mammals, fishes, birds and reptiles. We show that 
the per-generation mutation rate varies among species by a factor of 40, with 
mutation rates being higher for males than for females in mammals and birds, but not 
in reptiles and fishes. The generation time, age at maturity and species-level fecundity
are the key life-history traits affecting this variation among species. Furthermore,
species with higher long-term effective population sizes tend to have lower mutation
rates per generation, providing support for the drift barrier hypothesis. The
exceptionally high yearly mutation rates of domesticated animals, which have been 
continually selected on fecundity traits including shorter generation times, further 
support the importance of generation time in the evolution of mutation rates. Overall, 
our comparative analysis of pedigree-based mutation rates provides ecological 
insights on the mutation rate evolution in vertebrates. 


Germline mutations are the proximate source of genomic innovation 
and inherited diseases. Consequently, considerable effort has been
spent on characterizing the molecular processes underlying these 
mutations and estimating germline mutation rates (GMRs). Mutations 
are rare events, yet the frequency at which they are introduced into 
genomes at each generation varies considerably across taxa, from 
approximately 10⁻¹¹ mutations per site per generation in unicellular
eukaryotes up to approximately 10⁻⁷ mutations per site per generation
in multicellular eukaryotes. Inferring the driving forces of GMR
evolution has important implications for understanding the mecha- 
nisms underlying mutagenesis. Several hypotheses have been proposed 
to explain variation in GMRs among lineages. Some of these invoke 
molecular mechanisms such as DNA methylation or microsatellite
instability, whereas others invoke external factors such as exposure to
mutagenic environments. Other studies have argued that life-history
traits might explain some of the variation both in the prevalence of
mutations and in the ability to repair DNA. In particular, the generation
time and the metabolic rate have been suggested to be key
life-history traits that could be associated with germline mutations. 


From a long-term evolutionary perspective, the ‘drift barrier hypothesis’
proposes that lower mutation rates may reflect the increased efficiency
of natural selection at reducing the occurrence of mutations in
species with large effective population sizes.

However, a lack of accurate and standardized GMR estimation 
has so far precluded testing current hypotheses of GMR evolution. 
Pedigree-based estimates of GMRs per generation have recently been 
published for a handful of vertebrate species, mainly focusing on 
humans and primates. Furthermore, a recent comparative study
of 16 mammalian species identified an effect of lifespan on somatic
mutation rates inferred from the sequencing of intestinal crypts.
Nevertheless, interspecific comparisons of GMR variation remain
restricted in taxonomic scope, partly due to the difficulty of comparing
GMR estimates derived using different methodologies. For
example, alternative bioinformatic pipelines used in different studies
can yield GMR estimates that vary by a factor of two, even when applied
to the same parent-offspring trios. This highlights the importance
of applying consistent analytical pipelines for interspecies comparisons
of GMRs. We therefore generated high-depth genome sequences
(average coverage of more than 67×) for 323 individuals representing
151 trios of 68 vertebrate species, including 36 mammals, 18 birds, 
8 ray-finned fishes and 6 reptiles (Supplementary Table 1). We then 
quantified species-specific GMRs across this wide range of vertebrate 
taxa using consistent bioinformatics pipelines to test long-standing 
evolutionary hypotheses on GMR evolution. 


Per-generation mutation rate variation 


We first estimated the per-generation GMR (μ_generation) for each trio (that
is, mother, father and offspring) by comparing parental and offspring
genomes (Fig. 1a, Supplementary Tables 2 and 3 and Supplementary
Figs. 1-5 for details on the method). Overall, μ_generation varies by a factor
of 40 across all species. On average, mutation rates per generation
are higher in reptiles (average of all species 1.17 × 10⁻⁸, 95% CI of
the mean = 5.34 × 10⁻⁹ to 1.80 × 10⁻⁸) and birds (average of all species
1.01 × 10⁻⁸, 95% CI of the mean = 6.10 × 10⁻⁹ to 1.42 × 10⁻⁸) than in mammals
(average of all species 7.97 × 10⁻⁹, 95% CI of the mean = 7.04 × 10⁻⁹
to 8.90 × 10⁻⁹) and fishes (average of all species 5.97 × 10⁻⁹, 95% CI of the
mean = 4.39 × 10⁻⁹ to 7.55 × 10⁻⁹). However, the difference among the four
major classes of vertebrates is not overall statistically significant (analysis
of variance (ANOVA): F = 1.86, P = 0.15). Furthermore, the amount of variation
in μ_generation among species tends to be higher for birds and lower for
mammals and fishes (Fig. 1a), although this variation is arguably modest 
given large differences in life-history traits among these species (for 
example, there is a 2.8 million-fold difference in the body mass of killer 
whales and Siamese fighting fish, and there is a 93-fold difference in the 
generation time between humans and Texas banded geckos). 

Species with longer generation intervals are expected to have higher 
per-generation mutation rates due to a combination of a larger number
of cell divisions in spermatogenesis and more time for DNA damage to
accumulate. For the 105 trios for which parental age was known
at reproduction, we found a significant positive association between
μ_generation and the average parental age at reproduction (linear regression
adjusted r² = 0.14, P = 3.9 × 10⁻⁵; Fig. 1b). This pattern is also significant
for the 60 mammalian trios with known parental ages (linear regression
adjusted r² = 0.37, P = 1.6 × 10⁻⁷) and for the 32 bird trios after excluding
a single outlier, the Darwin’s rhea (linear regression adjusted r² = 0.31,
P = 0.0005). Furthermore, all three of these regressions have similar
positive y-intercept values on the order of approximately 0.59 × 10⁻⁸
mutations per site per generation. For the trios with known parental
ages, paternal and maternal ages at conception are strongly correlated
(linear regression adjusted r² = 0.77, P < 2.2 × 10⁻¹⁶; Extended Data
Fig. 1). However, multiple linear regression showed that the age of the
father is the most significant explanatory variable (adjusted r² = 0.15,
P = 9.3 × 10⁻⁵; paternal age P = 0.018; maternal age P = 0.785). Thus, a
stronger effect of paternal than maternal age on the mutation rate 
seems to be universal for birds and mammals due to more germline 
mutations accumulating throughout the life of the male. 

The specific types of de novo mutations (DNMs) observed across the 
151 trios are concordant with the results of previous studies of individual 
species, including a ratio of transitions over transversions of
2.3 (95% CI on binomial distribution = 2.2-2.5) and a high proportion
(48.5%, 95% CI on binomial distribution = 46.7-50.3%) of transitions
from strong base pairing to weak base pairing (C:G > T:A) across all
DNMs (Supplementary Table 4). Among C:G > T:A mutations, 42.4% (95%
CI on binomial distribution = 39.9-45.0%) occurred at CpG sites. The
direction of mutations from one base to another (that is, the spectrum
of mutation) differed significantly across vertebrate classes (χ² = 30.0,
d.f. = 15, P = 0.012; Supplementary Table 4 and Supplementary Fig. 6).
We also found significant differences among vertebrate classes for A > C
mutations (χ² = 16.2, d.f. = 3, P = 0.001) and for C > A mutations (χ² = 8.8,
d.f. = 3, P = 0.032). In particular, fish species exhibit significantly fewer
A > C mutations and significantly more C > A mutations than the other
vertebrate classes. However, this mutation pattern does not appear to
be associated with genome-wide CG content, as overall, the CG content
of fishes is similar to that of mammals and birds and lower than
that of reptiles (Supplementary Fig. 7). Finally, there is no significant
difference between the classes of species in the percentage of all mutations
located in CpG sites (χ² = 4.3, d.f. = 3, P = 0.23), implying that high
mutation rates at CpG sites are a conserved feature across vertebrates.
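
As an illustration of how summary statistics of this kind can be derived from a list of DNMs, the following Python sketch classifies substitutions as transitions or transversions and computes a confidence interval for a binomial proportion. The function names, the toy input and the choice of a Wald (normal-approximation) interval are assumptions made for illustration; this section does not state which interval method was used in the study.

```python
import math

# Purine <-> purine and pyrimidine <-> pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(mutations):
    """Return the transition/transversion ratio for (ref, alt) base pairs."""
    ts = sum(1 for m in mutations if m in TRANSITIONS)
    tv = len(mutations) - ts
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


def binomial_ci(successes, total, z=1.96):
    """Wald (normal-approximation) interval for a binomial proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical toy data, not taken from the study.
toy_dnms = [("C", "T"), ("G", "A"), ("A", "G"), ("A", "C"), ("C", "A")]
ratio = ts_tv_ratio(toy_dnms)  # 3 transitions over 2 transversions
low, high = binomial_ci(3, len(toy_dnms))
```

With realistic sample sizes the interval narrows quickly: for a proportion near 48.5% among roughly 3,000 mutations, the Wald half-width is a little under two percentage points, which is of the same order as the intervals quoted above.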


Variable male-driven evolution 


In mammals and birds, the much larger number of germ-cell divisions
per generation in the male germ line leads to the expectation of a male
mutation rate bias, coined the ‘male-driven evolution hypothesis’.
However, very little is known about interspecific variation in the magnitude
of the male-to-female ratio of the contribution of germline
mutations (α). Previous studies have reported high α values in mammals
(ranging from 1.0 to 20.1) and birds (ranging from 3.9 to 6.5)
based on indirect estimates obtained by comparing rates of sequence
divergence on the autosomes and sex chromosomes (see Extended
Data Fig. 2 and Supplementary Table 5). However, other evolutionary
forces can also act differently on the X chromosome and autosomes.
For example, stronger natural selection on the X chromosome could
lead to lower than expected divergence from the common ancestor,
upwardly biasing estimates of α. Furthermore, estimates of α derived
in this way are averages over a phylogenetic branch and may thus differ
from the contemporary species α. Here we directly quantified α
by assigning the parental origin of the DNMs. Around 48% of all 3,034
DNMs across all of the trios could be phased to their parental origin 
(see Supplementary Table 6 for positions of all mutations). Owing to 
the relatively small number of mutations in each trio (Supplementary 
Table 2), we analysed male bias after taxonomically grouping the spe- 
cies into classes and orders (Fig. 1c). 
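
The direct estimate of α described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that α is the ratio of paternally to maternally phased DNMs pooled within a group and that the interval is a normal approximation on the paternal fraction mapped onto the ratio scale. The function name and the counts are hypothetical, and the study's exact interval construction may differ.

```python
import math


def male_bias(paternal, maternal, z=1.96):
    """Estimate alpha = paternal/maternal DNMs with an approximate 95% CI.

    The interval is built on the paternal fraction p using the binomial
    variance, then mapped to the ratio scale via alpha = p / (1 - p).
    """
    if paternal <= 0 or maternal <= 0:
        raise ValueError("need at least one phased DNM from each parent")
    total = paternal + maternal
    p = paternal / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    p_low = max(p - half_width, 0.0)
    p_high = min(p + half_width, 1.0 - 1e-12)  # keep the ratio finite

    def to_ratio(q):
        return q / (1.0 - q)

    return to_ratio(p), to_ratio(p_low), to_ratio(p_high)


# Hypothetical counts: 70 paternal and 30 maternal phased DNMs.
alpha, alpha_low, alpha_high = male_bias(70, 30)
```

Because the interval is symmetric on the fraction scale, it becomes asymmetric around α after the transformation, which matches the shape of the intervals reported for the taxonomic groups.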

Mammals showed a male bias of α = 2.3 (95% CI = 2.0-2.6). In general,
our α estimates are in line with previous estimates derived for similar
species based on genome alignments. For example, we found that
among mammals, primates have the largest male bias with α = 3.8 (95%
CI = 2.6-5.7), similar to what was previously reported for several species
belonging to this group. Rodents have the lowest male bias
among the mammals in our study, with α = 2.1 (95% CI = 1.4-3.1), consistent
with a previous study based on mouse pedigrees. This pattern
can be explained by the short generation time of rodents, which leads
to a smaller difference in cell divisions between the male and female
germ lines. However, the variation in α is relatively small given the
variation in generation time among species (for example, between 30
years for humans and 8 months for the short-tailed opossum). Thus,
an alternative hypothesis to explain the observed α would be a higher
contribution of DNA damage, specifically in the male germ line for
species with long generation times.

Birds also showed an overall high male bias with α = 3.2 (95% CI = 2.5-4.1),
although there is appreciable variation among different lineages.
In particular, passerine birds and waterbirds (Pelecaniformes and Sphenisciformes)
exhibited the largest male bias, both with α = 7.6 (95%
CI = 4.3-13.5 for Passeriformes and 95% CI = 3.5-16.3 for Pelecaniformes
and Sphenisciformes). High levels of male-male competition will lead
to an increased amount of sperm being produced and faster sperm turnover,
which would be expected to cause a higher male bias. Indeed,
many passerine birds have large cloacal protuberances and relatively
heavy testes, which are often used as proxies of sperm competition.
For instance, in two of the passerine species included in our study,
testes represent between 1.2% (for Turdus merula) and over 2% (for
Saxicola maurus) of the total body mass. Moreover, extra-pair mating
is common in many passerine birds as well as in penguins, also
indicating a high level of sperm competition. Overall, our results lend
further support to the male-driven hypothesis in birds and mammals.

By contrast, reptiles have a relatively small male bias with α = 1.5
(95% CI = 1.2-1.8), whereas fishes appear to have a greater proportion of
mutations of maternal origin (α = 0.8), although the 95% CI overlaps 1
(95% CI = 0.5-1.4). This variation among vertebrate classes can be
explained by differences in the process of gametogenesis. Although
most birds and mammals produce sperm cells continuously through
time, reptiles and fishes tend to be seasonal breeders, producing
sperm cells during a limited period before the mating season, which
will tend to reduce differences in cell division numbers between males
and females, leading to more equal α. Moreover, female fishes are usually
synchronous ovulators, producing hundreds to millions of eggs
at the same time followed by a proliferation of new oogonia. This
implies that females continually produce germ cells throughout their
life, which would further reduce the difference in cell division number
between males and females.


Fig. 1 | Variation in GMRs and their association with life-history traits
across 68 vertebrate species. a, The phylogenetic tree of 68 species is based
on UCE data and is calibrated with fossil data at 14 nodes (see Methods;
Extended Data Fig. 3 and Supplementary Fig. 8). The average pedigree-based
mutation rates per generation for each species, which are represented by the
squares, show 40-fold variation among species. The 95% binomial confidence
intervals are shown, and individual trios are represented by round points. See
Supplementary Table 8 and Extended Data Fig. 4 for a comparison with
published estimated rates of closely related species. b, The per-generation
mutation rate is significantly associated with the average parental age at the
time of offspring production across all individuals with known paternal age
(105 trios), using linear regression. For birds, this relationship is statistically
significant after removing a single outlier, the Darwin’s rhea. c, The male-to-female
contribution ratio (α) is estimated for groups of vertebrates having at
least 30 mutations phased to their parents of origin in each group. The highest
male bias (7.6:1) is found in two bird lineages, whereas fishes and reptiles show
negligible male bias. The data are represented with 95% confidence intervals
based on the binomial variance. The silhouette of Syngnathus scovelli was
created by J.S. All other silhouettes are from PhyloPic (http://phylopic.org),
except for one of the silhouettes of Sarcophilus harrisii, which was created by
S. Werning, and the silhouette of Pan troglodytes, which was created by T. M.
Keesey (vectorization) and T. Hisgett (photography); both are available under a
CC-BY 3.0 licence (https://creativecommons.org/licenses/by/3.0).

Species with lower sex bias also exhibited a larger proportion of 
shared mutations between siblings, with 12.0% (s.e. of 6.5%) of shared 
mutations between siblings for fish and 8.1% (s.e. of 5.3%) for reptiles 
compared with 1.5% (s.e. of 0.7%) for mammals and 2.2% (s.e. of 1.4%) 
for birds (Supplementary Table 7). An explanation for the repeated 
occurrence of those mutations is that they appear during the primordial
germ cell specification in one of the parents. The occurrence of
primordial germ cell specification mutations is independent of parental 
sex. Consequently, a higher number of primordial germ cell specifi- 
cation mutations in some vertebrate groups could be an alternative 
explanation for the lower male-biased contribution to DNMs. 


Yearly mutation rates 


To use our results for phylogenetic dating and to compare the speed of 
evolution among species with different generation times, we needed 
estimates of yearly mutation rates. Different methods have been 
used in the literature to estimate yearly mutation rates. When sample 
sizes are small, yearly rates are commonly inferred by dividing the 
per-generation rate by the average age of the parents (or the generation 
time if parental age is unknown). However, this method implicitly 
assumes a constant accumulation of mutations from conception to 
reproduction, that is, the regression line of mutation rate on parental 
age should run through the origin. Our results (Fig. 1b), as well as previous
studies of mice, humans and cats, imply that parents always
carry a minimum number of mutations in their gametes regardless of
their age. This could lead to the yearly rate being overestimated for a
given species if the sampled trio (or trios) had young parents compared
with the average generation time for that species. Consequently, we
built a model that incorporates this mutational contribution at birth. 
Unfortunately, small per-species sample sizes in our dataset precluded 
modelling the effects of parental age separately for each species. How- 
ever, we observed very similar intercepts and slopes across taxonomic 
groups, allowing us to fit a joint model for all species. A Poisson model 
explaining the number of mutations in each trio using a mutational 
contribution at birth and a weighted average of paternal and maternal 
age fits the data surprisingly well. To incorporate interspecific variation 
in male bias, we used the per-species fraction of paternal and maternal 
mutations estimated using read-backed phasing to weigh the aver- 
age of the parental ages for each trio. Using this model, the number of 
predicted mutations matches the observed number with an overall r²
of 0.73 (mammalian r² = 0.58, avian r² = 0.51; Supplementary Note 1).
The yearly rates inferred with the naive method of dividing the
per-generation rate by parental age (μ_yearly) and the rates obtained with
our model (μ_yearly_modelled) yielded similar results (Pearson’s correlation
r² = 0.40, P = 0.002), and for 55% of the species, μ_yearly falls within the
95% confidence interval of μ_yearly_modelled. As expected, the estimates
showed the greatest differences for those species in which the parents 
reproduced far from the generation time, with the model-based esti- 
mates being smaller for those species that reproduced earlier than their 
generation time and larger for those species that reproduced later than 
their generation time. For example, the pigs in our dataset reproduced 
at around 6 months of age, which is more than 5 years earlier than the 
estimated generation time of this species. Thus, μ_yearly = 8.64 × 10⁻⁹
mutations per site per year was potentially overestimated compared
with the μ_yearly_modelled = 1.05 × 10⁻⁹ mutations per site per year at the generation
time. Conversely, the yearly rate of the Texas banded gecko
was potentially underestimated at μ_yearly = 3.17 × 10⁻⁹ mutations per site
per year using the reproductive age of 2 years from our dataset,
whereas the modelled rate was μ_yearly_modelled = 1.96 × 10⁻⁸ mutations per
site per year at a generation time of between 3 and 4 months. Both the
naive method and the modelled method have been used in the literature
to estimate yearly rates and both have caveats owing to the underlying
assumptions they require. Bearing this in mind, we decided to use
μ_yearly_modelled for the current analysis as we believe that this measure is
more representative of the yearly rate at the generation time for each
species (estimated yearly rates are provided in Supplementary Table 9
for comparison). 
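
The contrast between the two yearly estimates can be made concrete with a small Python sketch. It is a deliberately simplified stand-in for the model described above: it assumes a single straight line with an intercept (the mutational contribution at birth) and a slope per year of parental age, rather than the Poisson model with a sex-weighted parental age. All function names and parameter values are hypothetical and chosen only to show the direction of the bias.

```python
def naive_yearly_rate(mu_generation, parental_age_years):
    """Per-generation rate divided by parental age (line through the origin)."""
    return mu_generation / parental_age_years


def modelled_yearly_rate(intercept, slope_per_year, generation_time_years):
    """Yearly rate at the generation time under a line with an intercept.

    intercept: mutations per site present regardless of parental age
    slope_per_year: additional mutations per site per year of parental age
    """
    mu_at_generation_time = intercept + slope_per_year * generation_time_years
    return mu_at_generation_time / generation_time_years


# Hypothetical parameters, not estimates from the study.
intercept = 0.6e-8
slope = 0.05e-8
sampled_age = 0.5      # parents sampled well before the generation time
generation_time = 5.5  # typical age at reproduction for the species

observed_mu = intercept + slope * sampled_age
naive = naive_yearly_rate(observed_mu, sampled_age)
modelled = modelled_yearly_rate(intercept, slope, generation_time)
```

In this toy case the naive estimate is several times larger than the modelled one, because the age-independent intercept is divided by a very short parental age. The same reasoning in reverse explains why parents sampled later than the generation time yield a naive rate that is too low.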

The estimated average μ_yearly_modelled varies more than 120-fold among
species (Supplementary Note 1 and Supplementary Table 9), with the
highest μ_yearly_modelled estimated for the Texas banded gecko at 1.96 × 10⁻⁸
mutations per site per year (95% CI = 1.23 × 10⁻⁸ to 2.83 × 10⁻⁸), whereas
the lowest μ_yearly_modelled estimates were obtained for two bird species,
the griffon vulture and the snowy owl, both with less than 0.18 × 10⁻⁹
mutations per site per year (snowy owl: μ_yearly_modelled = 0.16 × 10⁻⁹, 95%
CI = 0.05 × 10⁻⁹ to 0.34 × 10⁻⁹; griffon vulture: μ_yearly_modelled = 0.17 × 10⁻⁹,
95% CI = 0.07 × 10⁻⁹ to 0.32 × 10⁻⁹). This large amount of interspecific
variation is remarkable given that pedigree-based GMR estimates of
individual species assessed by previous separate studies only show an
approximately 16-fold variation in yearly GMRs. Within primates, we
observed a twofold variation across species and found a general trend
for rates to be higher in the New World monkeys than in the great apes.
This is consistent with previous independent estimates from primates
and supports the ‘hominoid slowdown’ hypothesis.

Next, we used μ_yearly_modelled to assess the strength of the association
between GMRs and long-term evolutionary substitution rates. To obtain
an estimate of the long-term substitution rate, we used the alignment of
ultraconserved elements (UCEs), which are more likely to align among
taxonomically distant species, plus 1,000 bp of flanking regions on each
side of the UCE sequences, which will more closely reflect the neutral
substitution rate. We found a significant positive correlation between
μ_yearly_modelled and the UCE substitution rate after excluding domesticated
species owing to their overall much higher yearly mutation rates (see
the following section; phylogenetic generalized least squares (PGLS):
adjusted r² = 0.23, P = 0.002; Fig. 2a). This pattern is especially pronounced
for mammals (PGLS: adjusted r² = 0.44, P = 0.0008), even
after removing the two outliers (PGLS: adjusted r² = 0.32, P = 0.009).
We also found a significant relationship between μ_yearly_modelled and the
long-term substitution rate inferred using whole-genome alignments
(PGLS: adjusted r² = 0.12, P = 0.02; Fig. 2b).


Life-history traits shape GMR variation 

To test various hypotheses relating to the causes of GMR variation
among species, we tested for associations between the modelled
mutation rate per generation (μ_generation_modelled) and life-history traits
including mating system (monogamy versus polygamy), maturation
time, body mass, longevity, fecundity and the generation time (Supplementary
Table 9). We used μ_generation_modelled instead of μ_generation as
the former is less dependent on the age of the parents and is more representative
of the rate at generation time for a given species. Even when
taking into account phylogenetic relatedness, many of these traits
are significantly associated with μ_generation_modelled, including the generation
time (PGLS: adjusted r² = 0.15, P = 0.002; Fig. 3a), the maturation
time (PGLS: adjusted r² = 0.18, P = 0.0006; Fig. 3b) and the number of
offspring per generation (PGLS: adjusted r² = 0.10, P = 0.013; Fig. 3c).
Species with a higher number of offspring per generation also showed
significantly lower μ_generation_modelled when considering only mammalian
species (PGLS: adjusted r² = 0.17, P = 0.011), but this relationship was
not significant for birds (PGLS: adjusted r² = -0.066, P = 0.720). Collectively,
these traits explain almost 18% of the variation in μ_generation_modelled
(multiple PGLS: adjusted r² = 0.18, P = 0.004). The other life-history
traits that we tested, including longevity, mating strategy and body
mass, are not significantly associated with μ_generation_modelled (see Extended
Data Fig. 7).


Fig. 3 | Predictors of interspecific variation in GMRs. a-c, Significant
positive associations are found using phylogenetic regression (PGLS) between
the modelled per-generation mutation rates and three life-history traits:
species-specific mean generation time (a), age at sexual maturity (b) and the
number of offspring per generation (c). In total there are 55 species with
modelled per-generation rates, including 32 mammalian and 15 avian species.
The box plot in c represents the median, the interquartile range and the
maximum and minimum excluding outliers. d, Species-specific average
per-generation mutation rates are negatively associated with the harmonic
mean of the effective population size (N_e) over the past 1 million years, using
phylogenetic regression (PGLS).

Another key parameter for species evolution is the effective population
size (N_e), which impacts genetic drift and the efficacy of selection.
To investigate the effect of N_e on μ_generation_modelled and to test the drift barrier
hypothesis, which predicts the evolution of higher mutation rates
in species with small N_e, we calculated N_e using the pairwise sequentially
Markovian coalescent method based on one randomly selected father
per species. To avoid circularity, we estimated N_e based on the substitution
rate calculated from the UCE alignment (Supplementary Table 9).


[Fig. 4 panels: a, P = 0.0015; b, P = 0.085 (NS); x axis, domestication status (Yes/No); legend, mammals, birds, fishes and reptiles.]


Fig. 4 | The yearly GMRs are higher in domesticated species than in
non-domesticated species. a, Yearly GMRs are significantly higher in
domesticated or farmed species than in wild species (using phylogenetic
regression (PGLS) on a total of 68 species). b, Using the modelled mutation rate
instead (using phylogenetic regression (PGLS) on a total number of 55 species)
shows that there is no difference in yearly GMRs between domesticated and 
non-domesticated animals, suggesting that this difference is mainly driven by 
the shorter generation time of domesticated species. The box plots represent 
the median, the interquartile range and the maximum and minimum excluding 
outliers. 


Indeed, if N_e was estimated using the pedigree-based mutation rate,
a stronger correlation might arise between N_e and the mutation rate
(see Extended Data Fig. 8). We found a significant negative association
between μ_generation_modelled and the harmonic mean N_e per species over the
past 30,000-1,000,000 years (PGLS: adjusted r² = 0.08, P = 0.020;
Fig. 3d) as would be expected under the drift barrier hypothesis. This
relationship is mainly driven by mammals (PGLS: adjusted r² = 0.31,
P = 0.0006), a signal that is also observed when using the harmonic
average N_e over a smaller timescale (30,000-130,000 years; PGLS:
adjusted r² = 0.10, P = 0.04, Extended Data Fig. 8). The most appropriate
timeframe used to estimate N_e depends on the evolutionary
time necessary for the mutation rate to adapt to changes in N_e. However,
the pairwise sequentially Markovian coalescent method cannot
accurately estimate recent N_e. To overcome this limitation, we also
estimated N_e as π/4μ, with nucleotide diversity (π) and the substitution
rate per site per generation (μ) estimated from the UCE alignments.
This results in a similar negative association between N_e and
μ_generation_modelled (linear regression: adjusted r² = 0.83, P = 2.2 × 10⁻¹⁶;
Extended Data Fig. 9), further supporting the drift barrier hypothesis.
However, caution should be taken as N_e estimates rely on generation
times inferred from contemporary observations, whereas
generation times could conceivably have changed over evolutionary
timescales. Furthermore, population size depends negatively on the
generation time (PGLS N_e in log scale: adjusted r² = 0.20, P = 0.0004).
Therefore, a negative association between N_e and μ could potentially
be driven by a large effect of the generation time on per-generation
mutation rates.
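
The two N_e summaries used in this section can be written out as a short Python sketch. It assumes the diploid relationship N_e = π/(4μ) and an unweighted harmonic mean over a series of N_e values; in practice, coalescent-based trajectories have time intervals of unequal length, so a real analysis may need to weight by interval duration. Function names and input values are hypothetical.

```python
from statistics import harmonic_mean


def ne_from_diversity(pi, mu_generation):
    """Effective population size from N_e = pi / (4 * mu) for diploids."""
    if mu_generation <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4.0 * mu_generation)


def long_term_ne(ne_trajectory):
    """Unweighted harmonic mean of N_e values sampled through time."""
    return harmonic_mean(ne_trajectory)


# Hypothetical inputs, not values from the study.
ne_point = ne_from_diversity(pi=0.001, mu_generation=1e-8)  # about 25,000
ne_long = long_term_ne([10_000, 40_000])                    # 16,000
```

The harmonic mean is dominated by the smallest values in the trajectory, which is why past bottlenecks weigh heavily on long-term N_e. The first function also makes the circularity concern explicit: because μ sits in the denominator, estimating N_e with the same pedigree-based rate that is later regressed against it would induce a negative association by construction.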


High yearly rates in domesticated species 

Domestication imposes strong artificial selection, recurrent genetic 
bottlenecks or both. Our dataset includes 22 domesticated or semi-wild 
species that have been bred in captivity for many generations. When 
using the naive method of dividing the per-generation rate by the 
parental age, these species show significantly higher μ_yearly than the
non-domesticated species (PGLS: adjusted r² = 0.13, P = 0.0015; Fig. 4a).
The higher mutation rates of domesticated animals are likely due to
strong artificial selection for traits such as shorter generation times.
Indeed, using μ_yearly_modelled, we found no difference between domesticated
and non-domesticated species (PGLS: adjusted r² = 0.037,
P = 0.08; Fig. 4b). Consequently, the higher yearly mutation rate
observed in domesticated species is more likely to be explained by
the lowering of reproductive age associated with domestication rather
than by an inherent change to the mutational process caused by relaxed
selection on the mutation rate due to small population sizes and bottlenecks
associated with domestication.


Conclusions 


Here we analysed pedigree-based GMR variation in an unprecedentedly 
broad phylogenetic context. We showed that there is a consistent male 
bias in mammals and birds, whereas reptiles and fish exhibited more 
evenly matched contributions of DNMs between parents. This could be 
due to contrasting mutagenic processes, such as differences in male and
female germline cell division observed in mammals and birds, or differences
among species in the proportion of DNMs occurring in primordial
germ cell specification versus in the parental germ lines. Our results also 
support the drift barrier hypothesis, as we found a negative association 
between the per-generation mutation rate and effective population 
size. Moreover, our results suggest that an appreciable proportion of 
the variation in the GMR can be explained by life-history traits, including 
maturation time and the number of offspring per generation. Our study
also highlights the importance of the generation time, as illustrated 
by the particular case of domesticated animals, in which exceptionally 
high yearly mutation rate estimates can be explained by artificially 
induced short generation times. In addition, some of the trio samples in 
our study were collected from captive animals at zoos or conservation 
centres. These populations might have different generation times than 
those in the wild, which could potentially introduce biases into some 
of our mutation rate estimates. Future studies should focus on wild 
pedigree samples, which can be accessed from long-term conservation 
and monitoring programmes.


Methods


Samples 

Samples were collected from zoos, zoological museums, research 
institutes and farms from all over the world. Samples were provided 
by collaborators for research that was undertaken at the Natural
History Museum of Denmark, permit 2020-12-7186-00733 from the
Danish Ministry of Environment and Food, and when applicable, CITES
Certificate of Scientific Exchange number DK003. Genomic DNA was
extracted using DNeasy Blood and Tissue Kits (Qiagen) following
the manufacturer’s instructions. BGIseq libraries were built at China
National GeneBank (CNGB), Shenzhen, China, and whole-genome
paired-end sequencing (read length 2 × 100 bp) was performed on
the BGISEQ-500 platform. We aimed for 60-80× raw sequence coverage
per sample. A total of 68 species for which a reference genome
was available were retained in the final dataset, representing 151
trios for which whole blood or other tissue material was available for
DNA extraction and for which parentage had been genetically determined.
Information on the samples is provided in Supplementary
Table 1. 


GMR estimation 
We applied a bioinformatic analysis pipeline similar to that of our 
previous study of rhesus macaques. Raw reads were trimmed with the 
SOAPnuke filter. The mapping was conducted with BWA-MEM version 0.7.15. 
The versions of the reference genomes for each species are provided in 
Supplementary Table 9. A post-mapping step removed any reads mapping 
to multiple regions of the genome as well as duplicated reads using 
Picard MarkDuplicates 2.7.1. We called variants for each individual using 
HaplotypeCaller in BP_RESOLUTION mode with GATK 4.0.7.0. 
This mode returns a genotype quality and depth for all positions of the 
genome, not only the polymorphic sites. As recommended by GATK 
best practices, GenomicsDBImport combined all gVCF files per species 
into a single file and GenotypeGVCFs applied a joint genotyping of 
all samples within a given species (see Supplementary Table 3 for 
details of raw sequence coverage, mapping quality, and coverage 
after mapping and variant calling). Filtering methods similar to those 
in our previous study were then applied to detect DNMs. 
Each trio was filtered as follows: 

(1) For site filtering, variant positions were removed when they met any 
of the following criteria: QualByDepth (QD) < 2.0, FisherStrand (FS) > 20.0, 
RMSMappingQuality (MQ) < 40.0, MQRankSum < -2.0, MQRankSum > 4.0, 
ReadPosRankSum < -3.0, ReadPosRankSum > 3.0 and 
StrandOddsRatio (SOR) > 3.0, according to previously tested filters. 

(2) For Mendelian violations, variants that deviated from Mendelian 
inheritance were selected using GATK SelectVariants and refined with 
an R script to keep only sites in which both parents were homozygous 
for the reference allele (HomRef) and the offspring was heterozygous 
(Het). 

(3) For the allelic balance filter, in the case of a DNM, approximately 50% 
of the reads in the offspring should support the alternative allele. 
Our allelic balance filter cut-off was 30-70% of the reads supporting 
the alternative allele, similar to previous studies. 

(4) For the depth filter (DP), only positions with DP > 0.5 × $m_{\text{depth}}$ and 
DP < 2 × $m_{\text{depth}}$ for each individual were kept, with $m_{\text{depth}}$ being the 
average depth of the trio. This strict DP filter minimized the effects 
of sequencing errors in regions of low sequencing depth and of 
mis-mapping errors in high-coverage regions. 

(5) For the genotype quality filter (GQ), to ensure that only high-quality 
genotypes were retained for the analysis of trios, we removed all 
sites where any individual of the trio had a GQ < 60 (see Supplementary 
Fig. 2 for a comparison of various GQ thresholds on a subset 
of species). 
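
The five filters above can be summarized in a short sketch. This is an illustration only, not the authors' pipeline (which uses GATK and an R script): the `Genotype` container, the function names and the use of alternative reads divided by DP as the allelic balance are assumptions made for the example, and annotations missing from a site are simply not tested.

```python
from dataclasses import dataclass


@dataclass
class Genotype:
    """Per-individual call at one site (hypothetical container, not a GATK type)."""
    kind: str           # "HomRef", "Het" or "HomAlt"
    depth: int          # DP
    gq: int             # genotype quality
    alt_reads: int = 0  # reads supporting the alternative allele


def passes_site_filters(annotations):
    """Filter 1: reject a site if any available annotation crosses its threshold."""
    failing = [
        ("QD", lambda v: v < 2.0),
        ("FS", lambda v: v > 20.0),
        ("MQ", lambda v: v < 40.0),
        ("MQRankSum", lambda v: v < -2.0 or v > 4.0),
        ("ReadPosRankSum", lambda v: v < -3.0 or v > 3.0),
        ("SOR", lambda v: v > 3.0),
    ]
    return not any(
        name in annotations and is_bad(annotations[name])
        for name, is_bad in failing
    )


def is_candidate_dnm(father, mother, child, mean_depth,
                     ab_range=(0.30, 0.70), min_gq=60):
    """Filters 2-5 applied to one site of one trio."""
    trio = (father, mother, child)
    # Filter 2: Mendelian violation (both parents HomRef, offspring Het).
    if not (father.kind == "HomRef" and mother.kind == "HomRef"
            and child.kind == "Het"):
        return False
    # Filter 3: allelic balance of the offspring within 30-70%.
    if child.depth == 0:
        return False
    balance = child.alt_reads / child.depth
    if not ab_range[0] <= balance <= ab_range[1]:
        return False
    # Filter 4: depth strictly between 0.5x and 2x the trio's average depth.
    if any(not (0.5 * mean_depth < g.depth < 2 * mean_depth) for g in trio):
        return False
    # Filter 5: every individual needs GQ >= 60.
    return all(g.gq >= min_gq for g in trio)
```

For example, a trio with parents at depths 60 and 55, an offspring at depth 62 with 30 alternative reads, GQ 99 throughout and an average depth of 60 would be retained as a candidate.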

In addition, we called variants with bcftools (version 1.2) in the 
region of the candidate DNMs and removed the sites that appeared as 
false-positive calls (that is, at least one parent had the same variant as 
the offspring or the offspring had no variant). The number of candidates 
discarded varied among species (Supplementary Table 2). This quality 
control step produced similar results to a manual check with IGV. 
Moreover, calling variants with different variant callers has been shown 
to be an efficient method to reduce false-positive calls. All positions of 
DNMs are provided in Supplementary Table 6. In addition, we showed 
that sample type, reference genome quality and mapping quality can 
affect the number of candidates, the false-positive rate 
and the false-negative rate (FNR), yet the estimated mutation rates are 
not affected (Supplementary Figs. 3-5). 
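
The second-caller check described above amounts to a simple rule per candidate site. The sketch below assumes the bcftools calls have already been parsed into VCF-style genotype strings per trio member. The function names and the dictionary layout are hypothetical, and treating a site with no bcftools record as "offspring had no variant" is an assumption of this example.

```python
def carries_alt(genotype):
    """True if a VCF-style genotype string (e.g. "0/1") has a non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def recheck_candidates(candidates, second_calls):
    """Split candidate DNM sites into kept and discarded lists.

    candidates   : iterable of site keys, e.g. ("chr1", 12345)
    second_calls : dict mapping a site key to (father_gt, mother_gt, child_gt)
                   as called by the second variant caller
    """
    kept, discarded = [], []
    for site in candidates:
        call = second_calls.get(site)
        if call is None:
            # Assumption: no record means the second caller saw no variant.
            discarded.append(site)
            continue
        father_gt, mother_gt, child_gt = call
        false_positive = (
            carries_alt(father_gt)
            or carries_alt(mother_gt)
            or not carries_alt(child_gt)
        )
        (discarded if false_positive else kept).append(site)
    return kept, discarded
```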

To estimate per-generation rates, we divided the number of candidate 
DNMs, without the apparent false-positive candidates, by the size of the 
callable genome. A site was considered callable when it passed the same 
filters as the polymorphic sites, that is, when both parents were HomRef 
(filter 2) and the three individuals passed the depth filter (filter 4) and 
the genotype quality threshold (filter 5). On the sites considered callable, 
we applied a correction for the FNR, that is, the proportion of sites where 
true DNMs will not be called as such. Two methods have been used 
in the literature to estimate the FNR: one is the simulation of mutations 
and the other is a correction for the filters that are not accounted for 
in the callable genome. As in our previous study of GMR, we used 
the latter method, which is more conservative. This corrected for the 
remaining filters that can only be applied on polymorphic sites, namely 
the site filters (filter 1) and the allelic balance filter (filter 3). We estimated 
the proportion of sites that would be filtered away by the site filters on 
the parameters following a known distribution (FS, MQRankSum and 
ReadPosRankSum), and the expected sites filtered away by the allelic 
balance filter as the number of true heterozygote sites (one parent 
HomRef, the other parent HomAlt and their offspring Het) outside the 
allelic balance threshold. The mutation rate per site per generation 
was then estimated per trio as 
$\mu_{\text{generation}} = \text{DNMs} / ((1 - \text{FNR}) \times 2 \times \text{CG})$, 
where CG is the size of the callable genome. 
We estimated the 95% binomial confidence interval per species using 
the binconf() function in R, with the default Wilson score. 
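
As a worked illustration of the rate formula and the Wilson score interval, consider the sketch below. The function names are hypothetical, and the choice of the FNR-corrected number of haploid callable sites as the binomial denominator is an assumption of this example; the text does not state how binconf() was parameterized.

```python
import math


def per_generation_rate(n_dnm, fnr, callable_sites):
    """mu_generation = DNMs / ((1 - FNR) * 2 * CG)."""
    return n_dnm / ((1.0 - fnr) * 2.0 * callable_sites)


def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return centre - half, centre + half


def rate_with_interval(n_dnm, fnr, callable_sites):
    """Point estimate and Wilson interval.

    Assumption: the binomial 'trials' are the FNR-corrected haploid sites.
    """
    trials = round((1.0 - fnr) * 2.0 * callable_sites)
    low, high = wilson_interval(n_dnm, trials)
    return per_generation_rate(n_dnm, fnr, callable_sites), low, high


# Hypothetical trio: 30 DNMs, FNR of 5%, 1.5 Gb of callable genome.
rate, low, high = rate_with_interval(30, 0.05, 1.5e9)
print(f"{rate:.3e} ({low:.3e} to {high:.3e})")
```

With these made-up inputs the point estimate is close to 1.05 × 10⁻⁸ mutations per site per generation.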

To calculate yearly rates ($\mu_{\text{yearly}}$), we divided the per-generation rate 
by the average age of the parents at the time of reproduction weighted 
by the relative contribution of each parent (inferred with $\alpha$ for 105 
trios) or by the generation time (for 46 trios without parental ages). 
The resulting $\mu_{\text{yearly}}$ estimates were averaged per species (for 29 
species with multiple trios available). These yearly rates are dependent 
on the age of reproduction of the parents. Therefore, to calculate a 
yearly rate at generation time, we first modelled how the mutation 
rate of a trio was affected by the weighted average of the parental ages 
(using the paternal fraction estimated for that species as a weight). We 
then extended the model to fit how each species deviated from the 
average and used this to correct for differences between the observed 
reproductive age in our dataset and the expected generation time of 
a species (see Supplementary Note 1). With this, we estimated new 
$\mu_{\text{yearly modelled}}$ and $\mu_{\text{generation modelled}}$ values that are more 
representative of the rate at generation time for each species. 
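
The conversion from a per-generation to a yearly rate can be sketched as follows. The example assumes that $\alpha$ is the ratio of paternal to maternal mutations, so that the paternal fraction used as a weight is $\alpha/(1 + \alpha)$; that conversion and the function names are assumptions of this sketch, and the modelled correction of Supplementary Note 1 is not reproduced here.

```python
def paternal_fraction(alpha):
    """Share of DNMs of paternal origin, assuming alpha = paternal / maternal."""
    return alpha / (1.0 + alpha)


def weighted_parental_age(paternal_age, maternal_age, alpha):
    """Average parental age weighted by each parent's contribution of DNMs."""
    weight = paternal_fraction(alpha)
    return weight * paternal_age + (1.0 - weight) * maternal_age


def yearly_rate(mu_generation, paternal_age=None, maternal_age=None,
                alpha=None, generation_time=None):
    """Per-year rate; falls back to the generation time without parental ages."""
    if None not in (paternal_age, maternal_age, alpha):
        years = weighted_parental_age(paternal_age, maternal_age, alpha)
    elif generation_time is not None:
        years = generation_time
    else:
        raise ValueError("need parental ages with alpha, or a generation time")
    return mu_generation / years
```

For a hypothetical trio with a father aged 10 years, a mother aged 6 years and $\alpha = 3$, the weighted parental age is 9 years, so a per-generation rate of 9 × 10⁻⁹ corresponds to a yearly rate of about 1 × 10⁻⁹.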


Phylogenetic analysis 

The phylogeny was built based on two sets of UCEs: 5,472 baits for 5,060 
UCEs in tetrapods and 2,628 baits for 1,314 UCEs in acanthomorphs. 
We used the Phyluce software to locate the probes in the reference 
genomes of our 68 species with 6 additional species contained in our 
original dataset. We extracted a flanking region of +1,000 bp for each 
probe and aligned them with the MAFFT aligner version 7.470. We 
then created a 75% completion matrix, that is, each alignment contains 
at least 75% of the taxa (55 species), resulting in 63 alignments from the 
acanthomorph set and 2,742 alignments from the tetrapod set (all 
alignments are available on Figshare). A phylogenetic tree was built using 
IQ-TREE version 2.0.3, with the appropriate substitution model 
inferred for each of the 2,805 alignments, a maximum likelihood tree 
search and 1,000 bootstrap replicates. To validate our tree, we also 
estimated a second tree based on a MultiZ alignment to the human 
genome and obtained similar results (Extended Data Fig. 9). The 
phylogenetic tree was calibrated to absolute time using the chronos function 
of the ‘ape’ package in R, with a smoothing parameter lambda of 0 and 
a ‘relaxed’ model. Fourteen nodes were calibrated following previously 
published calibrations. The robustness of the tree was assessed 
by removing each node independently (see Extended Data Fig. 3). 

(1) Actinopterygii/Sarcopterygii: divergence time 416 million years 
ago (Ma), upper bound 425.4 Ma 

(2) The first node in the Actinopterygii group: divergence time 
378.2 Ma 

(3) Sauropsida (birds and reptiles)/Synapsida (mammals): divergence 
time 313.4 Ma 

(4) Archosauria (birds)/Testudines: divergence time 260 Ma 

(5) The basal nodes of the Lepidosauria: divergence time 222.8 Ma 

(6) First mammalian node, Eutheria/Metatheria: divergence time 
160.7 Ma 

(7) Galloanserae/Neoaves: divergence time 66 Ma 

(8) Glires/Primates: divergence time 61.7 Ma 

(9) Basal gekkotan node: divergence time 54 Ma 

(10) Passeriformes/Psittaciformes: divergence time 51.81 Ma 

(11) Cynoglossidae/Paralichthyidae: divergence time 50 Ma 

(12) Sus scrofa/other Cetartiodactyla: divergence time 48.5 Ma 

(13) Canidae/Arctoidea: divergence time 37.1 Ma 

(14) Hominoidea/Cercopithecoidea: divergence time 23.5 Ma 
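
The 75% completion matrix used to select alignments for the tree can be illustrated with a small sketch. It assumes each alignment is represented by the list of taxa it contains. The function names are hypothetical, and rounding the threshold down is an assumption chosen because it reproduces the 55 species quoted in the text for 74 taxa (68 species plus 6 additional ones).

```python
import math


def min_taxa_required(n_taxa, completeness=0.75):
    """Smallest number of taxa an alignment must contain (rounded down)."""
    return math.floor(n_taxa * completeness)


def filter_by_completeness(alignments, n_taxa, completeness=0.75):
    """Keep alignments that contain at least the required number of taxa.

    alignments : dict mapping a locus name to the taxa present in its alignment
    """
    needed = min_taxa_required(n_taxa, completeness)
    return {
        locus: taxa
        for locus, taxa in alignments.items()
        if len(set(taxa)) >= needed
    }


print(min_taxa_required(74))  # threshold for the 74 taxa considered here
```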


Mutational spectrum and sex bias 

To analyse the spectrum of mutation, we grouped the trios into higher 
taxonomic levels, that is, mammals, birds, fishes and reptiles. Thus, 
the percentages reported are based on the total candidate mutations 
from each group of species. We explored the genomic context of the 
mutations from a C or a G base to determine whether they were located 
in CpG sites (respectively followed by a G or preceded by a C) (see 
Supplementary Table 4). We phased the DNMs to their parental origin using 
the read-backed phasing method described previously (GitHub: 
https://github.com/besenbacher/POOHA). This method uses the read-pairs 
containing a DNM and another heterozygous variant to determine 
the parental origin of the mutation when the heterozygous variant 
is present in both the offspring and one of the parents. The phasing 
allowed us to identify parental biases in the contribution of the DNMs by 
grouping multiple species to increase the number of phased mutations 
and obtain a minimum of 30 phased mutations per taxon. From this 
analysis, we omitted the Egyptian rousette (Rousettus aegyptiacus), 
Chinese tree shrew (Tupaia belangeri), griffon vulture (Gyps fulvus), 
blue-throated macaw (Ara glaucogularis), snowy owl (Bubo scandiacus) 
and Darwin’s rhea (Rhea pennata), as these could not be grouped with 
another monophyletic clade. To quantify the effect of parental age, a 
linear regression between the per-generation mutation rate and the 
average parental age at the time of reproduction was implemented 
using the lm function in R. Multiple linear regression was also used to 
identify whether paternal or maternal age was the strongest predictor 
of the empirical mutation rate. 
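
The CpG rule stated above (a mutated C followed by a G, or a mutated G preceded by a C) can be written as a short check. The function names and the representation of each mutation as a (reference base, upstream base, downstream base) tuple are assumptions made for this sketch.

```python
def is_cpg(ref, upstream, downstream):
    """True if the mutated base sits in a CpG site.

    ref        : reference base at the mutated position
    upstream   : base immediately 5' of the position
    downstream : base immediately 3' of the position
    """
    ref, upstream, downstream = ref.upper(), upstream.upper(), downstream.upper()
    return (ref == "C" and downstream == "G") or (ref == "G" and upstream == "C")


def cpg_percentage(mutations):
    """Percentage of mutations from a C or G base that fall in CpG sites."""
    from_cg = [m for m in mutations if m[0].upper() in ("C", "G")]
    if not from_cg:
        return 0.0
    in_cpg = sum(is_cpg(ref, up, down) for ref, up, down in from_cg)
    return 100.0 * in_cpg / len(from_cg)
```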


Life-history trait analysis 

We tested the effect of various life-history traits (fitted as continuous 
and discrete variables) on the yearly rate for each species using PGLS 
analysis in the R package ‘caper’ (see Supplementary Table 9 for details 
about each life-history trait). 


Effective population size 

We used pairwise sequentially Markovian coalescent (PSMC) models 
to estimate the effective population size of each species. The 
aligned sequences (BAM format) of one randomly selected father per 
species were converted into fastq format using the samtools mpileup 
command and vcf2fq. As recommended, the minimum depth was set to 
one-third of the average depth for the sample and the maximum depth to 
twice the average. For mammals, fishes and reptiles, 
the parameters of the PSMC were set to -N25 for the maximum number 
of iterations of the algorithm, -t15 as the upper limit for the time to the 
most recent common ancestor, -r5 for the initial θ/ρ value, and finally 
the atomic intervals -p of 4+25*2+4+6. These parameters were used 
previously for PSMC analysis of various species, including primates, 
cetaceans, Felidae, fishes and turtles. For birds, we used different 
parameters according to the literature, with -N30 -t5 -r5. 
Finally, to simulate the history inferred by PSMC, we parameterized the 
generation time and the mutation rate inferred from the UCE alignment. 
We then explored the effect of the harmonic mean $N_e$ over windows 
of 30,000 years to 1,000,000 years. We also compared the $N_e$ estimates 
obtained with this method with those estimated based on $N_e = \pi/(4\mu)$. 
Nucleotide diversity ($\pi$) was calculated using ANGSD. This approach 
was implemented in three consecutive steps. From the alignment files, 
a global estimate of the site frequency spectrum was inferred using a 
maximum likelihood method, then the empirical $\pi$ value was estimated 
per site, and finally, a sliding window approach was used to estimate $\pi$ 
for each species. We used a window size of 50 kb and a step size of 10 kb 
together with an average pairwise estimation of $\pi$ to obtain global 
estimates of $\pi$. This analysis was restricted to unrelated individuals 
from each species, which corresponded to the 2 unrelated parents for 
55 species and between 3 and 7 individuals for 10 species; 3 species were 
excluded from this analysis as the parents were first-degree relatives. 
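
Three small calculations in this section lend themselves to a sketch: the depth bounds passed to vcf2fq, the harmonic mean of $N_e$ over a time window, and the diversity-based estimate $N_e = \pi/(4\mu)$. The function names are hypothetical, and taking an unweighted harmonic mean over the PSMC time points that fall inside the window is an assumption of this example; the text does not specify how unequal time intervals were handled.

```python
def psmc_depth_bounds(mean_depth):
    """Minimum (one-third of the mean) and maximum (twice the mean) depth."""
    return mean_depth / 3.0, 2.0 * mean_depth


def harmonic_mean_ne(times, ne_values, start=30_000, end=1_000_000):
    """Harmonic mean of N_e over PSMC time points within [start, end] years.

    Assumption: every time point in the window carries equal weight.
    """
    selected = [ne for t, ne in zip(times, ne_values) if start <= t <= end]
    if not selected:
        raise ValueError("no PSMC time points fall inside the window")
    return len(selected) / sum(1.0 / ne for ne in selected)


def ne_from_diversity(pi, mu_generation):
    """N_e = pi / (4 * mu) for a diploid population."""
    return pi / (4.0 * mu_generation)
```

For instance, a nucleotide diversity of 0.001 combined with a per-generation rate of 1 × 10⁻⁸ gives an $N_e$ of about 25,000.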


Reporting summary 
Further information on research design is available in the Nature 
Portfolio Reporting Summary linked to this article. 


Data availability 


Whole-genome sequences of all species except humans are accessible 
in the National Center for Biotechnology Information under the 
BioProject ID PRJNA767781. The human sequences are available on 
request to L.A.B. and should be used only for GMR studies, based on 
the participant’s request. The alignments for the UCE tree are available 
on Figshare (https://doi.org/10.6084/m9.figshare.19221693.v1). All 
animal silhouettes are from PhyloPic (http://phylopic.org/), except for 
the silhouette of S. scovelli, which was created by J.S. The silhouette of 
P. troglodytes was created by T. M. Keesey (vectorization) and T. Hisgett 
(photography), and one of the S. harrisii silhouettes was created by 
S. Werning; both are available under a CC-BY 3.0 licence 
(https://creativecommons.org/licenses/by/3.0/); the other silhouettes are 
available under a Public Domain Mark 1.0 licence. 


Code availability 


The bioinformatics pipeline used to analyse the genomes and all other data 
analyses are available on GitHub 
(https://github.com/lucieabergeron/vertebrate_rate). 
Extended Data Fig. 1 | Association of parental ages. Maternal and paternal ages are significantly positively correlated for the 105 trios with known parental age 
at reproduction (linear regression; adjusted $r^2$ = 0.77, F = 342.3 on 1 and 103 DF, $p < 2.2 \times 10^{-16}$). 
Extended Data Fig. 2 | Comparison of published male bias estimates ($\alpha$) 
using genome alignments and our male bias estimates (modified Fig. 1c of 
the main text). The yellow points are $\alpha$ estimates from Wilson Sayres et al., 
and the purple points are $\alpha$ estimates from Wu et al. Most of the species 
in common show similar estimates with overlapping 95% confidence intervals. 
However, the estimates of $\alpha$ based on genome alignments are generally lower for 
dogs and cats than our estimates, yet the pedigree-based estimate of $\alpha$ for cats 
(Wang et al.; green point) is similar to our estimate. See also Supplementary 
Table 5. The barplots represent male biases estimated by clustering different 
species per group (to have a minimum of 30 phased mutations per group) and 
the 95% confidence intervals were based on the binomial distribution. The 
silhouette of Syngnathus was created by J.S. All other silhouettes are from PhyloPic 
(http://phylopic.org), except one of the silhouettes of Sarcophilus harrisii, 
which was created by S. Werning, and the silhouette of Pan troglodytes, which 
was created by T. M. Keesey (vectorization) and T. Hisgett (photography); 
both are available under a CC-BY 3.0 licence (https://creativecommons.org/ 
licenses/by/3.0). 
Extended Data Fig. 3 | Robustness of the calibration. We compared the 
estimated substitution rates using the 14 initial calibration points with the 
inferred substitution rates using only 13 calibration points (with 14 iterations to 
remove each calibration node one by one). We found a strong relationship 
(adjusted $r^2$ = 0.91, F = 9416 on 1 and 950 DF, $p < 2.2 \times 10^{-16}$). However, some 
of the calibration points had a stronger impact on the estimated substitution 
rates. For instance, removing the two bird nodes (7 and 10), the gekko node (9), 
the Canidae/Arctoidea node (13) and the Glires/Primates node (8) altered some 
Extended Data Fig. 4 | Per-generation mutation rates (similar to Fig. 1a) 
including published data on closely related species. For each species, the 
colored squares represent the average per-generation observed rate, along 
with the 95% confidence intervals based on the binomial distribution, and the 
black points represent published estimates from similar or closely related 
species to those included in our dataset. For most of the species, these estimates 
lie within the 95% confidence intervals of our estimates. Published estimates 
are from: Felis catus (Wang et al.), Mus musculus (Milholland et al., Lindsay 
et al.), Pan troglodytes (Venn et al., Tatsumoto et al., Besenbacher et al.), 
Homo sapiens (Conrad et al., Kong et al., Francioli et al., Rahbari et al., 
Wong et al., Jonsson et al., Maretty et al., Turner et al., Sasani et al., 
Kessler et al.). The closely related species are: close to Salmo salar, 
Clupea harengus (Feng et al.); close to Paralichthys olivaceus, the cichlid 
(Malinsky et al.); close to Canis lupus familiaris, Canis lupus (Koch et al.); close 
to Capra hircus, Bos taurus (Harland et al.); close to Mandrillus leucophaeus, 
Papio anubis (Wu et al.), Macaca mulatta (Wang et al., Bergeron et al.) and 
Chlorocebus sabaeus (Pfeifer); close to Saimiri boliviensis boliviensis, Aotus 
nancymaae (Thomas et al.); close to Monodelphis domestica, Ornithorhynchus 
anatinus (Martin et al.); close to Taeniopygia guttata, Ficedula albicollis 
(Smeds et al.). See also Supplementary Table 8. The silhouette of Syngnathus 
was created by J.S. All other silhouettes are from PhyloPic (http://phylopic.org), 
except one of the silhouettes of S. harrisii, which was created by S. Werning, and 
the silhouette of P. troglodytes, which was created by T. M. Keesey (vectorization) 
and T. Hisgett (photography); both are available under a CC-BY 3.0 licence 
(https://creativecommons.org/licenses/by/3.0). 
Extended Data Fig. 5 | Germline mutation rates are associated with 
long-term substitution rates. This figure is similar to the main Fig. 2 but uses 
phylogenetic regression (PGLS) on a log scale. The grey dashed lines indicate 
the per-year rates and the rates derived from ultraconserved elements (UCEs) 
and their flanking sequences. b. However, this correlation is not significant 
when comparing the per-year rates with the rates derived from the 
whole-genome alignments (WGAs). 
maximum and minimum excluding outliers. 
Extended Data Fig. 8 | The drift barrier hypothesis on different times and 
different mutation rate parameters used to estimate $N_e$ with phylogenetic 
regression (PGLS). a. The correlation between $N_e$ and the mutation rate per 
generation is not significant when using the most recent value before 30,000 
years estimated by PSMC. b. The relationship is also not significant when using 
the harmonic mean over a more recent period of time (30,000 years to 130,000 
years ago). However, this relationship is significant for mammals (adjusted 
$r^2$ = 0.104, p = 0.04). We used the harmonic mean over the past million years in 
the main text, as PSMC is not reliable over recent periods. c. When looking at the 
relationship between the mutation rate and $N_e$ estimated using the 
pedigree-based mutation rate, we find a stronger signal over the past 1,000,000 years, 
probably due to the circularity of this analysis. d. However, the relationship is 
still not significant when using the most recent time point or e. the average over 
the past 100,000 years. 
Software and code 


Data collection: No software was used for data collection. 

Data analysis: The software used was: samtools 1.2 and 0.1.18, bcftools 1.2 and 1.9, bwa 0.7.15, Picard MarkDuplicates 2.7.1, R 3.5.1 and 4.0.2, IQ-TREE 
2.0.3, Phyluce, ANGSD 0.920, SOAPnuke 1.5.6, Python 3.7.3, Java 1.8.0_222 and GATK 4.0.7.0. 
The entire pipeline used is available at https://github.com/lucieabergeron/vertebrate_rate. 


Whole-genome sequences of all species except humans are accessible in NCBI (National Center for Biotechnology Information) under the BioProject PRJNA767781 
(https://www.ncbi.nlm.nih.gov/bioproject/767781). The human sequences are available upon request. The alignments for the UCE tree are available on Figshare 
(https://doi.org/10.6084/m9.figshare.19221693.v1). 
Policy information about studies involving human research participants and Sex and Gender in Research. 


Reporting on sex and gender: Not applicable 
Population characteristics: Not applicable 
Recruitment: Not applicable 
Ethics oversight: Not applicable 


Ecological, evolutionary & environmental sciences study design 


Study description: This study estimates the germline mutation rates for 68 species of vertebrates using whole-genome comparison of pedigrees. 

Research sample: We analyzed 151 trios for 68 species of vertebrates, including birds, reptiles, fishes and mammals. 

Sampling strategy: We collected trios (mother, father, offspring) for species with a reference genome available. Our sample size for each species was 
limited by the pedigrees available. 

Data collection: Lucie A. Bergeron collected all the data, with the help of collaborators from zoos, research centres and farms. 

Timing and spatial scale: Data were collected between October 2017 and December 2018. 

Data exclusions: After data analysis, some of the samples proved not to be as related as stated by the sample providers and were therefore excluded. 

Reproducibility: All sequences and scripts are provided for reproducibility of our results. 

Randomization: Randomization was not relevant to our study and was not possible, as the sample size per species was small. 

Blinding: Blinding was not relevant to our study, as there were no expected data: most of the rates had never been estimated from 
pedigrees before. 